************Replication Code for "Learning to Predict Proliferation"**********


*****Discrimination: full sample******
sum correct if year<1974
*Outcome assessments only (process==0)*
sum correct if process==0&year<1974

sum correct if Predict_prolif==0&year<1974
sum correct if Predict_prolif==1&year<1974

*Finding modal_outcome = nonproliferation*
sum outcome_prolif if year<1974

*Comparing NIE predictions vs. predictions using modal outcomes*
ci mean correct if year<1974
ci mean correct_modal_prediction if year<1974
ttest correct == correct_modal_prediction if year<1974
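*Sketch (not in the original file): one way correct_modal_prediction could be
*cross-checked, assuming outcome_prolif is a 0/1 indicator and the modal
*outcome is nonproliferation, as found above. modal_chk is a hypothetical name.
gen byte modal_chk = (outcome_prolif==0) if year<1974 & !missing(outcome_prolif)
tabulate modal_chk correct_modal_prediction if year<1974, missing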

****False positives vs. false negatives: full sample***
sum Predict_prolif if correct==0&year<1974
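*Sketch (not in the original file): the same split of errors can be read from a
*two-way table. The column percentages under correct==0 give the shares of
*errors that are false negatives (Predict_prolif==0) and false positives
*(Predict_prolif==1), assuming both variables are 0/1 indicators.
tabulate Predict_prolif correct if year<1974, column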


***Brier Scores: full sample***
*Squared error of each probability estimate; its mean is the Brier score*
gen error = (mandel_prob - outcome)^2
ci mean error if year<1974

gen baserate_error = (.37 - outcome_prolif)^2
ci mean baserate_error if year<1974

ttest error == baserate_error if year<1974
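*Sketch (not in the original file): the .37 base rate above appears to be the
*pre-1974 sample mean of outcome_prolif (an assumption based on the earlier
*"sum outcome_prolif" step). It can be recovered without hard-coding;
*br and baserate_chk are hypothetical names.
quietly sum outcome_prolif if year<1974
local br = r(mean)
display "Pre-1974 base rate: " %5.3f `br'
gen baserate_chk = (`br' - outcome_prolif)^2 if year<1974
ci mean baserate_chk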


***Calibration: full sample***
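*Mean outcome and number of estimates (stored in year) at each probability level*
collapse (mean) outcome (count) year if year<1974, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
*Weight each probability level by its number of estimates*
ci mean deviation [aweight=year]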

*clear and reload data*
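*Sketch (not in the original file): wrapping a collapse in preserve/restore
*leaves the full data in memory afterwards, so no manual reload is needed.
*Assumes the full dataset has been reloaded as instructed above; n_est is a
*hypothetical name for the number of estimates at each probability level.
preserve
collapse (mean) outcome (count) n_est = year if year<1974, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=n_est]
restore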

***Discrimination and false positives and negatives, by year***
graph bar correct if year<1974, over(year) scheme(s1mono) ytitle("Proportion Correctly Classified") title("Performance By Year") bar(1, color(blue)) blabel(bar)
tabulate correct y195758 if year<1974, chi2

graph bar over under if year<1974, over(year) scheme(s1mono) ytitle("Proportion of Estimates") title("Error Type, By Year") bar(1, color(blue)) bargap(50) 
tabulate over y195758 if year<1974, chi2
tabulate under y195758 if year<1974, chi2


****Brier Scores: by time period***
gen error = (mandel_prob - outcome)^2
ci mean error if y195758==1
ci mean error if y196066==1
ttest error, by(y195758) unequal

sum outcome_prolif if y195758==1
gen baserate_error5758 = (.419 - outcome_prolif)^2 if y195758==1
ci mean baserate_error5758

sum outcome_prolif if y196066==1
gen baserate_error6066 = (.359 - outcome_prolif)^2 if y196066==1
ci mean baserate_error6066

ttest error == baserate_error5758 if y195758==1
ttest error == baserate_error6066 if y196066==1


***Calibration: by time period***
collapse (mean) outcome (count) year if y195758==1, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=year]

*clear and reload data*

collapse (mean) outcome (count) year if y196066==1, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=year]

*clear and reload data*



***Accounting for improvement: discrimination***
graph bar correct if intel_target==1&year<1974, over(year) scheme(s1mono) ytitle("Proportion Correctly Classified") title("Performance by Year, Intel Targets") bar(1, color(blue)) blabel(bar)
graph bar correct if intel_target==0&year<1974, over(year) scheme(s1mono) ytitle("Proportion Correctly Classified") title("Performance by Year, Non-Targets") bar(1, color(blue)) blabel(bar) 

tabulate correct y195758 if intel_target==0&year<1974, chi2
tabulate correct y195758 if intel_target==1&year<1974, chi2


***Analysis of Initial Assessments***
graph bar correct if first_estimate==1&year<1974, over(year) scheme(s1mono) ytitle("Proportion Correctly Classified") title("Accuracy of First Assessments") bar(1, color(blue)) blabel(bar) 
tabulate correct y195758 if first_estimate==1&year<1974, chi2



*************Appendix**************

***Performance by country assessed***
sort Country
by Country: sum correct if year<1974

***Regression models accounting for potential confounders: full sample***

reg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if year<1974, cl(cowcc)
reg over y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if year<1974, cl(cowcc)
reg under y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if year<1974, cl(cowcc)
areg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if year<1974, absorb(cowcc)


***Replicating results using Bleek codings***

*Discrimination and false positives/negatives*
sum Correct_b if year<1974
sum Correct_b if y195758==1
sum Correct_b if y196066==1

graph bar Correct_b if year<1974, over(year) scheme(s1mono) ytitle("Proportion Correctly Classified") title("Performance By Year") bar(1, color(blue)) blabel(bar)
tabulate Correct_b y195758 if year<1974, chi2

graph bar over_b under_b if year<1974, over(year) scheme(s1mono) ytitle("Proportion of Estimates") title("Error Type, By Year") bar(1, color(blue)) bargap(50)
tabulate over_b y195758 if year<1974, chi2
tabulate under_b y195758 if year<1974, chi2

*Brier Scores*
gen error_b = (mandel_prob - outcome_b)^2 if year<1974

ci mean error_b if year<1974

ci mean error_b if y195758==1
ci mean error_b if y196066==1
ttest error_b, by(y195758) unequal

*Calibration Index*
collapse (mean) outcome_b (count) year if year<1974, by(mandel_prob)
gen deviation_b = (mandel_prob-outcome_b)^2
ci mean deviation_b [aweight=year] 

*clear and reload data*

collapse (mean) outcome_b (count) year if y195758==1, by(mandel_prob)
gen deviation_b = (mandel_prob-outcome_b)^2
ci mean deviation_b [aweight=year] 

*clear and reload data*

collapse (mean) outcome_b (count) year if y196066==1, by(mandel_prob)
gen deviation_b = (mandel_prob-outcome_b)^2
ci mean deviation_b [aweight=year] 

*clear and reload data*


***Results only using outcome assessments***

*Discrimination and false positives/negatives*
sum correct if year<1974&process==0
sum correct if y195758==1&process==0
sum correct if y196066==1&process==0

graph bar correct if process==0&year<1974, over(year) scheme(s1mono) ytitle("Proportion Correctly Classified") title("Performance by Year, Outcome Assessments") bar(1, color(blue)) blabel(bar) 
tabulate correct y195758 if year<1974&process==0, chi2

graph bar over under if year<1974&process==0, over(year) scheme(s1mono) ytitle("Proportion of Estimates") title("Error Type, By Year") bar(1, color(blue)) bargap(50)
tabulate over y195758 if year<1974&process==0, chi2
tabulate under y195758 if year<1974&process==0, chi2

*Brier Scores*
gen error = (mandel_prob - outcome)^2
ci mean error if year<1974&process==0
ci mean error if y195758==1&process==0
ci mean error if y196066==1&process==0
ttest error if process==0, by(y195758) unequal


*Calibration index*
collapse (mean) outcome (count) year if process==0&year<1974, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=year]

*clear and reload data*

collapse (mean) outcome (count) year if process==0&y195758==1, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=year]

*clear and reload data*

collapse (mean) outcome (count) year if process==0&y196066==1, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=year]

*clear and reload data*


***Robustness to including 1974 NIE***
sum correct if year==1974

gen error = (mandel_prob - outcome)^2
sum error if year==1974

collapse (mean) outcome (count) year if year==1974, by(mandel_prob)
gen deviation = (mandel_prob-outcome)^2
ci mean deviation [aweight=year]

*clear and reload data*


***Explaining Change over Time, Supplementary Analysis***

*Brier scores by time period and country type*
gen error = (mandel_prob - outcome)^2
ci mean error if intel_target==1&y195758==1
ci mean error if intel_target==1&y196066==1

gen baserate_error5758 = (.419 - outcome_prolif)^2 if y195758==1
ci mean baserate_error5758

gen baserate_error6066 = (.359 - outcome_prolif)^2 if y196066==1
ci mean baserate_error6066


*Regression models: accounting for change over time*
reg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if intel_target==1&year<1974, cl(cowcc)
reg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if intel_target==0&year<1974, cl(cowcc)

areg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if intel_target==1&year<1974, cl(cowcc) absorb(cowcc)
areg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future if intel_target==0&year<1974, cl(cowcc) absorb(cowcc)



***Calibration: raw data***
sort mandel_prob
by mandel_prob: sum outcome if year<1974

by mandel_prob: sum outcome if y195758==1
by mandel_prob: sum outcome if y196066==1

***Showing improvement for proliferation and nonproliferation predictions***
sum correct if Predict_prolif==0&y195758==1
sum correct if Predict_prolif==0&y196066==1

sum correct if Predict_prolif==1&y195758==1
sum correct if Predict_prolif==1&y196066==1

tabulate correct y195758 if Predict_prolif==0, chi2
tabulate correct y195758 if Predict_prolif==1, chi2


***Accounting for Specificity***
reg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future specific if year<1974, cl(cowcc)

reg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future specific if intel_target==1&year<1974, cl(cowcc)
reg correct y196066 Adversary Ally democracy autocracy disputes rivalry us_nca Energy Future specific if intel_target==0&year<1974, cl(cowcc)







